This is a 10-hour course teaching the basics of Spark to students who are already familiar with coding and machine learning.
All examples are given in Python.
Go to the book's companion repository: https://github.com/databricks/Spark-The-Definitive-Guide
Download the whole repo. It also provides the data used in the examples.
[SDG] The book: Spark: The Definitive Guide
[2] https://luminousmen.com/post/spark-tips-partition-tuning
Either use the free Community Edition of Databricks (https://community.cloud.databricks.com/)
or run Spark locally on your PC (instructions are provided for Linux/Windows/Mac)
- understand the concepts
- practice simple operations
- get basic familiarity with configuration and tuning
- run simple machine learning models
What are horizontal scaling and vertical scaling?
Apache Spark is an open-source cluster computing framework.
Developed as a faster, more general successor to Hadoop MapReduce.
Uses in-memory computing.
Originally developed at UC Berkeley (2009).
In a real production environment you can use a Databricks-managed cluster in the cloud, or Microsoft HDInsight. You can also install your own Spark cluster, locally or in the cloud; such a cluster can grow to thousands of machines.
During this course we will use a minimal installation on your own PC/Mac/Linux machine.
Instructions are here: https://github.com/cnoam/spark-course/blob/master/readme.md
Occasionally, you will have opportunities to check your knowledge. Try to answer/solve/execute all the questions. They will help you make sure you are ready for the next part!
- Explain the difference between horizontal and vertical scaling
- Look up the definition of "cluster". Does it match what we have in Spark?